[TDX] Added basic documentation to enable TDX in ChatQnA #1212

Draft

JakubLedworowski wants to merge 6 commits into opea-project:main from JakubLedworowski:enable-intel-tdx-chatqna

JakubLedworowski commented Nov 28, 2024

Description

Confidential computing for AI in the cloud protects sensitive data and computations from unauthorized access and tampering. It uses hardware-based isolation and encryption to create secure environments in which data and AI models can be processed safely, so that even the cloud service provider cannot access the data. With confidential computing, organizations can use AI in the cloud for tasks that involve sensitive information, such as healthcare data analysis or financial predictions, while complying with strict data protection regulations.

This change introduces a guide on protecting selected microservices with Intel TDX:

- added README_tdx.md
- added chatqna_tdx.yaml, which configures all microservices with TDX protection and default settings
- described the additional steps needed to run ChatQnA with a custom setup

Issues

n/a

Type of change

List the type of change like below. Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds new functionality)
Breaking change (fix or feature that would break existing design and interface)
Others (enhancement, documentation, validation, etc.)

Dependencies

n/a

Tests

Manual tests with a sample request, with TDX enabled in all ChatQnA services:
dataprep, embedding, llm, redis, reranking, retriever, tei, teirerank, tgi

JakubLedworowski force-pushed the enable-intel-tdx-chatqna branch from 68149c6 to e47902a Compare

November 28, 2024 08:50

dcmiddle reviewed

ChatQnA/kubernetes/intel/README_tdx.md Outdated

+              ### Kubelet Configuration
+              To run a complex and heavy application like OPEA, the cluster administrator must increase the kubelet timeout for container creation, otherwise the pod creation may fail due to timeout `Context deadline exceeded`.
+              This is required because the container creation process can take a long time due to the size of pod images and the need to download the AI models.

Contributor

dcmiddle Dec 6, 2024

Is this timeout change generally required for any k8s deployment? If so should this be added to the main k8s readme?

Author

JakubLedworowski Dec 9, 2024

This is generally required for any use case where container creation takes a long time. When TDX is involved, container creation time increases so much that it usually exceeds the default 2 minutes. The setting is described in the k8s docs: https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
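
For readers following this thread, here is a minimal sketch of raising that timeout. It assumes the 2-minute default mentioned above corresponds to the `runtimeRequestTimeout` field of the kubelet configuration file; the thread itself does not name the field. The helper name `set_runtime_request_timeout`, the example path, and the `10m` value are illustrative and do not come from the PR.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): set runtimeRequestTimeout in a kubelet
# configuration file, replacing an existing top-level entry or appending one.
# Assumes the file is plain YAML and ends with a newline.
set_runtime_request_timeout() {
  config_file="$1"   # e.g. /var/lib/kubelet/config.yaml (path is an assumption)
  timeout="$2"       # e.g. 10m (illustrative value)
  tmp_file="$(mktemp)" || return 1
  if grep -q '^runtimeRequestTimeout:' "$config_file"; then
    # An entry already exists: rewrite its value.
    sed "s/^runtimeRequestTimeout:.*/runtimeRequestTimeout: ${timeout}/" "$config_file" > "$tmp_file"
  else
    # No entry yet: copy the file and append one.
    cat "$config_file" > "$tmp_file"
    printf 'runtimeRequestTimeout: %s\n' "$timeout" >> "$tmp_file"
  fi
  # Write back through the original file so its ownership and mode are kept.
  cat "$tmp_file" > "$config_file"
  rm -f "$tmp_file"
}
```

On a node this would typically be run with root privileges against the kubelet configuration file, followed by a kubelet restart for the new value to take effect.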

Contributor

dcmiddle Dec 10, 2024

Is that just for peer pods, or does running CoCo on the host also often exceed 2 minutes?

JakubLedworowski commented

ChatQnA/kubernetes/intel/README_tdx.md Outdated

+              > [!NOTE]
+              > Running TDX-protected services requires the user to define the pod's resources request (cpu, memory).
+              >
+              > Due to lack of hotplugging feature in TDX, the assigned resources cannot be changed after the pod is scheduled and the resources will not be shared with any other pod.

Author

JakubLedworowski Dec 11, 2024

check hotplugging (TEE-specific? or kata-specific?)

JakubLedworowski commented

ChatQnA/kubernetes/intel/README_tdx.md Outdated

+              >
+              > After kubelet restart, some of the internal pods from `kube-system` namespace might be reloaded automatically.
+              All kubelet configuration options can be found [here](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/).

Author

JakubLedworowski Dec 11, 2024

remove

JakubLedworowski commented

ChatQnA/kubernetes/intel/README_tdx.md Outdated

Comment on lines 135 to 140

+              ```bash
+              POD_NAME=$(kubectl get pods | grep 'chatqna-tgi' | awk '{print $1}')
+              kubectl get pod $POD_NAME -o jsonpath='{.spec.runtimeClassName}'
+              ```
+              In the output you should see:

Author

JakubLedworowski Dec 11, 2024

Just show that it is running
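
One possible way to act on this note is to report the pod's phase instead of its runtime class. The sketch below is an assumption about what such a check could look like, not the wording that ended up in the README. The helper name `pod_phase` is hypothetical, and it matches pods by name substring in the same spirit as the `grep 'chatqna-tgi'` in the quoted snippet.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): print the phase of the first pod in the
# current namespace whose name contains the given pattern.
pod_phase() {
  pattern="$1"
  # Take the first matching pod name from the first column of the listing.
  pod_name="$(kubectl get pods --no-headers | awk -v p="$pattern" 'index($1, p) { print $1; exit }')"
  if [ -z "$pod_name" ]; then
    echo "no pod matching ${pattern}" >&2
    return 1
  fi
  # .status.phase is Pending, Running, Succeeded, Failed, or Unknown.
  kubectl get pod "$pod_name" -o jsonpath='{.status.phase}'
}
```

Calling `pod_phase chatqna-tgi` prints the phase of the matching pod. Picking only the first match avoids passing several pod names to the second `kubectl` call when more than one replica exists.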

JakubLedworowski added 5 commits

December 12, 2024 14:55


          [TDX] Added basic documentation to enable TDX in ChatQnA

a73ce58

- added README_tdx.md
- described steps to run ChatQnA using helm and GMC

Signed-off-by: Jakub Ledworowski <[email protected]>


          [TDX] Improved TDX enabling guide

- Removed deployment option with helm
- Added sample chatqna_tdx.yaml
- Generalized description but left ChatQnA as an example

Signed-off-by: Jakub Ledworowski <[email protected]>


          [TDX] Improve writing

806d993

Signed-off-by: Jakub Ledworowski <[email protected]>


          [TDX] Fixed paths to chatqna.yaml

d7e3771

Signed-off-by: Jakub Ledworowski <[email protected]>


          [TDX] Simplified the descriptions; added Getting Started

608e869

Signed-off-by: Jakub Ledworowski <[email protected]>

JakubLedworowski force-pushed the enable-intel-tdx-chatqna branch from 12e4c94 to 608e869 Compare

December 12, 2024 14:33


          [TDX] Improved description and scripts after review

c48d38d

Signed-off-by: Jakub Ledworowski <[email protected]>
